{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Building Gradient Agreement Algorithm with MAML"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the last section we saw how Gradient agreement algorithm works. We saw how gradient agreement adds weights to the gradients implying their importance.\n",
    "\n",
    "\n",
    "Now we will see how to use our gradient agreement algorithm MAML by coding them from scratch with just numpy. For better understanding, we consider a simple binary classification task. We randomly generate our input data and we train them with a simple single layer neural network and try to find the optimal parameter theta. \n",
    "\n",
    "Now we will step by step of how exactly we are doing this,\n",
    "\n",
    "First we import all the necessary libraries,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate Data Points"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we define a function called sample_points for generating our input (x,y) pairs. It takes the parameter k as an input which implies number of (x,y) pairs we want to sample. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def sample_points(k):\n",
    "    x = np.random.rand(k,50)\n",
    "    y = np.random.choice([0, 1], size=k, p=[.5, .5]).reshape([-1,1])\n",
    "    return x,y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above function returns output as follows, "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.86213006 0.85488908 0.73205517 0.43742195 0.13757096 0.89868047\n",
      " 0.63167011 0.34247553 0.3392788  0.03710747 0.89854446 0.35188684\n",
      " 0.49703317 0.38018055 0.7631458  0.96567766 0.79066254 0.97301861\n",
      " 0.92213652 0.76549118 0.51386957 0.12254211 0.02287478 0.54356945\n",
      " 0.34627381 0.15294395 0.55542084 0.2456739  0.89171505 0.89245352\n",
      " 0.62508429 0.34307283 0.46678738 0.46477777 0.41564971 0.67397622\n",
      " 0.46779937 0.12663504 0.04662468 0.45111941 0.05543247 0.16493359\n",
      " 0.15365002 0.31825373 0.42284395 0.40723585 0.6032624  0.22342533\n",
      " 0.23117682 0.18153011]\n",
      "[0]\n"
     ]
    }
   ],
   "source": [
    "x, y = sample_points(10)\n",
    "print x[0]\n",
    "print y[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Single Layer Neural Network"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For simplicity and better understand, we use a neural network with only single layer for predicting the output. i.e,\n",
    "\n",
    "a = np.matmul(X, theta)\n",
    "\n",
    "YHat = sigmoid(a)\n",
    "\n",
    "\n",
    "\n",
    "__*So, we use gradient agreement with MAML for finding this optimal parameter value theta that is generalizable across tasks. So that for a new task, we can learn from a few data points in a lesser time by taking very less gradient steps.*__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Gradient Agreement with MAML"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we define a class called GradientAgreement_MAML where we implement the Gradient Agreement MAML algorithm. In the \\__init__  method we will initialize all the necessary variables. Then we define our sigmoid activation function. Followed by we define our train function. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can check the comments written above each line of code for understanding."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class GradientAgreement_MAML(object):\n",
    "    def __init__(self):\n",
    "        \n",
    "        #initialize number of tasks i.e number of tasks we need in each batch of tasks\n",
    "        self.num_tasks = 2\n",
    "        \n",
    "        #number of samples i.e number of shots  -number of data points (k) we need to have in each task\n",
    "        self.num_samples = 10\n",
    "\n",
    "        #number of epochs i.e training iterations\n",
    "        self.epochs = 100\n",
    "        \n",
    "        #hyperparameter for the inner loop (inner gradient update)\n",
    "        self.alpha = 0.0001\n",
    "        \n",
    "        #hyperparameter for the outer loop (outer gradient update) i.e meta optimization\n",
    "        self.beta = 0.0001\n",
    "       \n",
    "        #randomly initialize our model parameter theta\n",
    "        self.theta = np.random.normal(size=50).reshape(50, 1)\n",
    "      \n",
    "    #define our sigmoid activation function  \n",
    "    def sigmoid(self,a):\n",
    "        return 1.0 / (1 + np.exp(-a))\n",
    "    \n",
    "    \n",
    "    #now let us get to the interesting part i.e training :P\n",
    "    def train(self):\n",
    "        \n",
    "        #for the number of epochs,\n",
    "        for e in range(self.epochs):        \n",
    "            \n",
    "            self.theta_ = []\n",
    "            \n",
    "            #for storing gradient updates\n",
    "            self.g = []\n",
    "            \n",
    "            #for task i in batch of tasks\n",
    "            for i in range(self.num_tasks):\n",
    "               \n",
    "                #sample k data points and prepare our train set\n",
    "                XTrain, YTrain = sample_points(self.num_samples)\n",
    "                \n",
    "                a = np.matmul(XTrain, self.theta)\n",
    "\n",
    "                YHat = self.sigmoid(a)\n",
    "\n",
    "                #since we are performing classification, we use cross entropy loss as our loss function\n",
    "                loss = ((np.matmul(-YTrain.T, np.log(YHat)) - np.matmul((1 -YTrain.T), np.log(1 - YHat)))/self.num_samples)[0][0]\n",
    "                \n",
    "                #minimize the loss by calculating gradients\n",
    "                gradient = np.matmul(XTrain.T, (YHat - YTrain)) / self.num_samples\n",
    "\n",
    "                #update the gradients and find the optimal parameter theta' for each of tasks\n",
    "                self.theta_.append(self.theta - self.alpha*gradient)\n",
    "                \n",
    "                #compute the gradient update\n",
    "                self.g.append(self.theta-self.theta_[i])\n",
    "                \n",
    "                               \n",
    "           #now we calculate the weights\n",
    "           #we know that weight is the sum of dot product of g_i and g_j divided by a normalization factor. \n",
    "            \n",
    "            normalization_factor = 0\n",
    "            \n",
    "            for i in range(self.num_tasks):\n",
    "                for j in range(self.num_tasks):      \n",
    "                    normalization_factor += np.abs(np.dot(self.g[i].T, self.g[j]))\n",
    "                    \n",
    "            w = np.zeros(self.num_tasks)\n",
    "            \n",
    "            for i in range(self.num_tasks):\n",
    "\n",
    "                for j in range(self.num_tasks):\n",
    "                    w[i] += np.dot(self.g[i].T, self.g[j])\n",
    "\n",
    "                w[i] = w[i] / normalization_factor\n",
    "                \n",
    "                \n",
    "     \n",
    "            #initialize meta gradients\n",
    "            weighted_gradient = np.zeros(self.theta.shape)\n",
    "                        \n",
    "            for i in range(self.num_tasks):\n",
    "            \n",
    "                #sample k data points and prepare our test set for meta training\n",
    "                XTest, YTest = sample_points(10)\n",
    "\n",
    "                #predict the value of y\n",
    "                a = np.matmul(XTest, self.theta_[i])\n",
    "                \n",
    "                YPred = self.sigmoid(a)\n",
    "                           \n",
    "                #compute meta gradients\n",
    "                meta_gradient = np.matmul(XTest.T, (YPred - YTest)) / self.num_samples\n",
    "                \n",
    "                \n",
    "                weighted_gradient += np.sum(w[i]*meta_gradient)\n",
    "\n",
    "  \n",
    "            #update our randomly initialized model parameter theta with the meta gradients\n",
    "            self.theta = self.theta-self.beta*weighted_gradient/self.num_tasks\n",
    "                                       \n",
    "            if e%10==0:\n",
    "                print \"Epoch {}: Loss {}\\n\".format(e,loss)             \n",
    "                print 'Updated Model Parameter Theta\\n'\n",
    "                print 'Sampling Next Batch of Tasks \\n'\n",
    "                print '---------------------------------\\n'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "model = GradientAgreement_MAML()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 0: Loss 2.31910705563\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 10: Loss 5.15767994967\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 20: Loss 2.73213692474\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 30: Loss 3.02219229143\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 40: Loss 1.94991368208\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 50: Loss 3.51899454773\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 60: Loss 3.82146653722\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 70: Loss 1.52969798833\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 80: Loss 2.25524267251\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 90: Loss 1.55851499834\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n"
     ]
    }
   ],
   "source": [
    "model.train()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
